AUDIO SIGNAL PROCESSING APPARATUS AND SIGNAL PROCESSING
METHOD OF THE SAME

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an audio signal
processing apparatus and a signal processing method
capable of changing the reproduction speed of an audio
signal without changing its pitch, and of realizing that
change easily with a small amount of calculation.

2. Description of the Related Art

In order to convert the reproduction speed of an audio
signal (including a voice signal and a sound signal,
hereinafter simply referred to as an audio signal)
without changing the pitch, it is necessary to perform
cross-correlation calculations over a wide range of the
audio signal. Further, it is necessary to calculate in
advance a framework enabling flexible interpolation of
the parameters of the audio signal, that is, a parametric
representation of the audio signal.

As a decoder for audio encoding performing forward
prediction, there is the code excited linear prediction
(CELP) decoder. Figure 7 is a block diagram of an example
of the configuration of a CELP decoder. As shown in the
figure, the CELP decoder comprises an adaptive code book
10, a gain code book 20, a stochastic code book 30,
buffers 40 and 50, an adder circuit 60, and a linear
prediction code (LPC) synthesis filter 70.

In a CELP decoder, residual signals e(n) are obtained by
adding a pitch component e_a(n) and a noise component
e_s(n), each adjusted in amplitude. In accordance with
the residual signals e(n), an audio signal S(n) is
synthesized by the LPC synthesis filter 70.

Summarizing the problem to be solved by the invention: in
the CELP decoder and other decoders for forward
prediction encoding of the related art, converting the
audio signal on the time axis requires a large amount of
computation and complicated processing.

SUMMARY OF THE INVENTION

An object of the present invention is to provide an audio
signal processing apparatus and a signal processing
method capable of changing the reproduction speed of an
audio signal without changing its pitch, with a small
amount of calculation, by utilizing the pitch information
of the audio signal and changing the length of the
predictive residual signals while maintaining continuity.

To attain the above object, according to a first aspect
of the present invention, there is provided an audio
signal processing apparatus for reproducing an audio
signal based on predictive residual signals in decoding
of a signal encoded by forward prediction on a frame by
frame basis, comprising an excitation source modifying
means for extending or shortening the predictive residual
signals on a time axis and a synthesizing means for
synthesizing an audio signal based on the predictive
residual signals converted by the excitation source
modifying means.

According to a second aspect of the present invention,
there is provided an audio signal processing apparatus
for reproducing an audio signal based on predictive
residual signals in decoding of a signal encoded by
forward prediction on a frame by frame basis, comprising
an excitation source modifying means for shortening the
predictive residual signals by taking out a first signal
from one sub-frame of the predictive residual signals and
a second signal from a following sub-frame, or for
extending the predictive residual signals by connecting
data estimated by extrapolation to the signals of a
frame, while maintaining the pitch; and a synthesizing
means for synthesizing an audio signal based on the
predictive residual signals converted by the excitation
source modifying means.

Preferably, the excitation source modifying means
comprises a dividing means for dividing the signal of a
sub-frame into a first signal whose length is m (m is an
integer, m < L, and L is the length of the sub-frame) and
a remaining signal whose length is (L - m), used as a
reference signal, and a finding means for finding the
signal closest to the reference signal in the signal of
another sub-frame, and shortens the predictive residual
signals by concatenating the first signal and the closest
signal.

Preferably, the excitation source modifying means
comprises a first multiplying means for multiplying the
reference signal by a first window function, a second
multiplying means for multiplying the signal taken out
from the other sub-frame by a second window function, and
an adding means for adding the results of the first and
second multiplying means, and concatenates the result of
the adding means after the first signal taken out from
the sub-frame to generate one pitch worth of new
predictive residual signals.

Preferably, the finding means calculates cross-correlation
values between the reference signal and the signal of the
other sub-frame, and cuts out, as the closest signal, a
signal starting from the position where the calculated
cross-correlation value becomes the largest.

Alternatively, the finding means calculates the square
error between the reference signal and the signal of the
other sub-frame, and cuts out, as the closest signal, a
signal starting from the position where the calculated
square error becomes the smallest.
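The cross-correlation criterion of the finding means can be sketched as follows. This is a minimal illustration, assuming the search covers every offset at which the reference signal fits entirely inside the other sub-frame; the function name best_offset_xcorr is hypothetical and not taken from the original.

```python
def best_offset_xcorr(ref, x):
    """Return the offset i that maximizes sum_n ref[n] * x[n + i].

    Assumption: i runs over every position where ref fits inside x.
    Ties resolve to the smallest offset.
    """
    m = len(ref)
    best_i, best_val = 0, float("-inf")
    for i in range(len(x) - m + 1):
        val = sum(r * x[i + n] for n, r in enumerate(ref))
        if val > best_val:  # strictly greater keeps the earliest maximum
            best_i, best_val = i, val
    return best_i


# The reference [1, 2, 3] sits at offset 2 of the search region.
print(best_offset_xcorr([1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 2.0, 3.0, 0.0]))
```

Note that an unnormalized cross-correlation also grows with the amplitude of the candidate segment, so it can prefer a louder segment over a better-shaped one; the square error criterion does not have this property.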

Preferably, the excitation source modifying means
extends the predictive residual signals at a given
extension rate by taking a signal of predetermined length
from the end of the predictive residual signals of a frame
and concatenating that signal after the end of the
predictive residual signals to generate new residual
signals.

Preferably, the synthesizing means is a linear prediction
code synthesis filter.

According to a third aspect of the present invention,
there is provided an audio signal processing method for
extending or shortening predictive residual signals on a
time axis in decoding of a signal encoded by forward
prediction on a frame by frame basis, comprising
processing for shortening or extending the signals of one
frame, either by shortening the predictive residual
signals, cutting out a first signal from a sub-frame of
the predictive residual signals and a second signal from
a following sub-frame based on cross-correlation while
maintaining the pitch, or by extending the predictive
residual signals, connecting data estimated by
extrapolation to the signals of a frame; and processing
for synthesizing an audio signal based on the shortened
or extended predictive residual signals.

Preferably, the method further comprises shortening the
predictive residual signals by cutting out, from the
predictive residual signals input for every frame, m
signals (m is an integer and m < L) out of one pitch of
length L in a previous frame, using the remaining (L - m)
signals as reference signals, cutting out the signals
closest to the reference signals from the predictive
residual signals in the next frame, and connecting them
after the m signals taken out from the previous frame to
generate one pitch worth of new predictive residual
signals; that is, dividing the signal of the sub-frame
into the first signal whose length is m (m is an integer,
m < L, and L is the length of the sub-frame) and the
remaining signal whose length is (L - m) as a reference
signal, finding the signal closest to the reference
signal in the other sub-frame, and concatenating the
first signal and the closest signal.



Preferably, the method further comprises shortening the
predictive residual signals by first multiplication
processing for multiplying the reference signal by a
first window function, second multiplication processing
for multiplying the signal cut out from the other
sub-frame by a second window function, and adding
processing for adding the results of the first and second
multiplication processing, and connecting the result of
the adding processing after the first signal cut out from
the sub-frame to generate one pitch worth of new
predictive residual signals.

Preferably, the method further comprises extending the
predictive residual signals at a given extension rate by
taking a signal of predetermined length from the end of
the predictive residual signals of a frame and
concatenating that signal to the end of the predictive
residual signals to generate extended predictive residual
signals.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other objects and features of the present
invention will become clearer from the following
description of the preferred embodiments given with
reference to the attached drawings, in which:

Fig. 1 is a circuit diagram of an embodiment of audio
signal processing according to the present invention;

Figs. 2A and 2B are waveform diagrams showing processing
when shortening a residual signal e(n) on a time axis;

Fig. 3 is a waveform diagram showing processing for
extending data by extrapolation;

Figs. 4A to 4D are waveform diagrams showing processing
for improving data continuity of residual signals to be
connected by using a window function;

Fig. 5 is a waveform diagram of processing for extending
a residual signal e(n) on a time axis by extrapolation;

Figs. 6A and 6B are waveform diagrams of a method for
improving continuity of data when extending a residual
signal by using a window function; and

Fig. 7 is a block diagram of an example of a CELP
encoded audio signal decoder of the related art.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

First Embodiment

To convert the reproduction speed of an audio signal
without changing its pitch, there are methods of signal
processing on the time axis, for example, the processing
method called PICOLA, and methods that change how
parameters are interpolated on the frequency axis. The
present invention proposes a method of signal processing
on the time axis, specifically in the residual signal
domain rather than the audio signal domain, and a signal
processing apparatus for realizing the method.

Figure 1 is a circuit diagram of an embodiment of a
signal processing apparatus according to the present
invention.

As shown in the figure, the signal processing apparatus
of the present embodiment comprises an adaptive code book
10, a gain code book 20, a stochastic code book 30,
buffers 40 and 50, an adder circuit 60, a linear
prediction code (LPC) synthesis filter 70, and an
excitation source modifier 80. That is, the audio signal
processing apparatus of the present invention is applied
to a code excited linear prediction (CELP) decoder: it is
a normal CELP decoder with the excitation source modifier
80 added.

In the audio signal processing apparatus of the present
invention, the excitation source modifier 80 cuts out
data or uses extrapolation to shorten or extend, on the
time axis, the residual signal e(n) calculated from the
pitch component e_a(n) and the noise component e_s(n) in
the CELP decoder. This makes it possible to change the
length of the audio signal on the time axis and to
convert the reproduction speed of the audio signal
without changing the pitch component.

In the audio signal processing apparatus of the present
invention, the adaptive code book 10 calculates a signal
e_a(n) indicating the present pitch component
(hereinafter simply referred to as the pitch component
for convenience) in accordance with an index S_a of the
input pitch component and outputs it to the buffer 40.
Note that, as shown in Fig. 1, the residual signal e(n)
calculated by the adder circuit 60 is fed back to the
adaptive code book 10. Namely, the adaptive code book 10
is updated in accordance with the fed-back residual
signal e(n) in the same way as in a normal decoder.

The stochastic code book 30 calculates a signal e_s(n)
indicating the present noise component (hereinafter
simply referred to as the noise component for
convenience) in accordance with an index S_p of the input
noise component and outputs it to the buffer 50.

The gain code book 20 calculates a pitch component gain
control signal g_a and a noise component gain control
signal g_s in accordance with an index S_g of the input
gain and outputs them to the buffers 40 and 50,
respectively.

The buffer 40 controls the amplitude of the pitch
component e_a(n) by the gain set by the pitch component
gain control signal g_a and supplies a pitch component
e_a1(n) to the adder circuit 60.

The buffer 50 controls the amplitude of the noise
component e_s(n) by the gain set by the noise component
gain control signal g_s and supplies a noise component
e_s1(n) to the adder circuit 60.

Namely, the pitch component e_a(n) and the noise
component e_s(n) are controlled in amplitude by the pitch
component gain control signal g_a and the noise component
gain control signal g_s obtained from the gain code book
20. The resulting pitch component e_a1(n) and noise
component e_s1(n) are sent to the adder circuit 60. By
adding the pitch component e_a1(n) and the noise
component e_s1(n) in the adder circuit 60, a residual
signal e(n) is calculated and output to the excitation
source modifier 80.
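The gain and adder stage described above can be sketched as below. This is a minimal illustration, assuming that each of the buffers 40 and 50 simply multiplies its component by one scalar gain for the whole sub-frame; the function name combine_excitation is hypothetical.

```python
def combine_excitation(e_a, e_s, g_a, g_s):
    """Residual e(n) = g_a * e_a(n) + g_s * e_s(n) for one sub-frame.

    e_a, e_s: pitch and noise components (equal-length sequences).
    g_a, g_s: scalar gains (assumed constant over the sub-frame).
    """
    if len(e_a) != len(e_s):
        raise ValueError("pitch and noise components must have equal length")
    # buffer 40 scales e_a, buffer 50 scales e_s, adder circuit 60 sums them
    return [g_a * a + g_s * s for a, s in zip(e_a, e_s)]


print(combine_excitation([1.0, 2.0], [0.5, 0.5], 2.0, 2.0))
```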

The excitation source modifier 80 performs processing
for shortening and extending the residual signal e(n) on
the time axis by cutting, extrapolation, or other
interpolation. In this way, a residual signal e_c(n)
converted in length on the time axis is obtained without
changing the pitch. The residual signal e_c(n) obtained
by the excitation source modifier 80 is output as the
excitation (drive sound source) to the LPC synthesis
filter 70, whereby the audio signal S_0(n) is reproduced.

The LPC synthesis filter 70 synthesizes and reproduces the
audio signal in accordance with the residual signal
e_c(n) output by the excitation source modifier 80 and an
LPC coefficient S_p input from the outside. Since the
residual signal extended or shortened on the time axis is
supplied by the excitation source modifier 80, the audio
signal S_0(n) synthesized by the LPC synthesis filter 70
becomes an audio reproduction signal that is extended or
shortened on the time axis compared with the original
audio signal, without its pitch being changed.
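A minimal sketch of an LPC synthesis filter of the kind referred to above, assuming the common direct-form all-pole structure s(n) = e_c(n) + sum_k a_k s(n - k), that is, the sign convention A(z) = 1 - sum_k a_k z^-k. The list a stands in for the LPC coefficients S_p; how S_p is decoded into predictor coefficients is not described in the text, and the names lpc_synthesis and history are hypothetical.

```python
def lpc_synthesis(excitation, a, history=None):
    """All-pole synthesis: s(n) = e(n) + sum_{k=1..p} a[k-1] * s(n - k).

    excitation: residual samples e_c(n) for the current frame.
    a: predictor coefficients a_1..a_p (assumed sign convention, see above).
    history: previous output samples, most recent last (filter memory).
    """
    p = len(a)
    past = list(history[-p:]) if (history and p) else []
    past = [0.0] * (p - len(past)) + past  # zero-fill missing filter memory
    out = []
    for e in excitation:
        s = e + sum(a[k] * past[-1 - k] for k in range(p))
        out.append(s)
        past.append(s)
    return out


# Impulse response of a one-pole filter with a_1 = 0.5
print(lpc_synthesis([1.0, 0.0, 0.0, 0.0], [0.5]))
```

Because the filter only sees the excitation, feeding it a time-scaled e_c(n) changes the duration of S_0(n) while the spectral envelope set by the coefficients stays the same.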

In the present invention, the adaptive code book 10, the
gain code book 20, the stochastic code book 30, and the
LPC synthesis filter 70 are the same as those of the CELP
decoder of the related art. The excitation source
modifier 80 of the present invention shortens and extends
the residual signal e(n) on the time axis by cutting,
extrapolation, or other interpolation.

Below, the operation of the excitation source modifier 80
will be explained in further detail to clarify the
principle and method of processing for converting the
reproduction speed of an audio signal in the present
invention.

The excitation source modifier 80 performs processing to
extend or shorten a residual signal e(n) on the time
axis. First, shortening a residual signal e(n), that is,
raising the reproduction speed of an audio signal, will
be explained using examples of signal waveforms.

Figures 2A and 2B are waveform diagrams showing the
principle of shortening a residual signal e(n) in the
excitation source modifier 80. Figure 2A is a view of an
example of the waveform of a residual signal e(n). Here,
it is assumed that the residual signal e(n) is a signal
digitized at a predetermined sampling frequency in the
audio signal processing apparatus. The sampling frequency
f_s is, for example, 8 kHz. In linear prediction coding
(LPC) of an audio signal, the audio signal is processed
in units of frames divided on the time axis. For example,
when one frame has a length of 20 ms and sampling is
performed at 8 kHz, 160 samples of data are obtained in
one frame. Further, in the processing in the excitation
source modifier 80 of the present invention, each frame
is divided into four sub-frames. Each sub-frame has 40
samples of data and a length of 5 ms on the time axis.
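The frame arithmetic above (20 ms at 8 kHz gives 160 samples, split into four 40-sample, 5 ms sub-frames) can be checked with a small helper. The function name frame_layout and its defaults are illustrative only.

```python
def frame_layout(fs_hz=8000, frame_ms=20, subframes=4):
    """Return (samples per frame, samples per sub-frame, sub-frame length in ms)."""
    samples_per_frame = fs_hz * frame_ms // 1000
    if samples_per_frame % subframes:
        raise ValueError("frame does not split evenly into sub-frames")
    samples_per_subframe = samples_per_frame // subframes
    subframe_ms = frame_ms / subframes
    return samples_per_frame, samples_per_subframe, subframe_ms


print(frame_layout())  # the example values used in the text
```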

Below, the shortening (cutting) of the residual signal
e(n) shown in Fig. 2A will be explained under the above
conditions, taking as an example the processing for
compressing the residual signal e(n) to half its original
length on the time axis, that is, for doubling the
reproduction speed.

In a CELP decoder, the pitch of the audio signal is found
by forward prediction of the audio signal. Namely, the
pitch is already known when cutting is performed in the
excitation source modifier 80.

Here, the residual signal in a frame F is designated as
e(n) (n = 0, 1, 2, ..., 159). The length of the pitch of
the audio signal is L, which is already known in the
frame F. Here, it is assumed that L = 40. The frame F is
further divided into four sub-frames f1, f2, f3, and f4.

To double the reproduction speed of the audio signal
means to find, based on the residual signal e(n), a new
residual signal e_c(n) whose pitch L is unchanged and
whose length on the time axis is half that of the
original residual signal. To realize this, the excitation
source modifier 80 of the present embodiment takes out
half of the data from one pitch worth of data, uses the
remaining half as a reference signal to search the next
pitch worth of data in the original residual signal for
the signal closest to the reference signal, and combines
the found data with the data taken out from the previous
pitch to generate one pitch worth of new residual data.
As a result of such processing, a new audio signal can be
reproduced at double the reproduction speed, without
changing the pitch of the original audio signal and while
maintaining its characteristics. Note that the degree of
approximation to the reference signal can be judged by a
cross-correlation value or a square error value. Namely,
the signal closest to the reference signal can be found
by the criterion of the largest cross-correlation value
with the reference signal or the smallest square error
with the reference signal. Here, as an example, the
squared difference (or mean square error) with the
reference signal is used as the criterion, and the signal
having the least square error is taken as the signal
closest to the reference signal. Below, the method of
audio signal processing of the present embodiment will be
explained in further detail taking as an example the
waveform of the residual signal shown in Fig. 2A.

First, in the first sub-frame f1, data having half the
length of the pitch L are taken out from an appropriate
position in the residual signals e(0) to e(39) to obtain
converted residual signals e_c(0) to e_c(19). Note that
the cutting position can be set around the position where
a peak of the residual signals e(n) appears in the first
sub-frame f1. As a result, the first half of one pitch
worth of new residual signals e_c(n) is formed.

Next, the second half of the one pitch worth of new
residual signals e_c(n), that is, the residual signals
e_c(20) to e_c(39), is obtained. To compress the length of
the audio signal while sufficiently maintaining the
characteristics of the original audio signal, the second
half of the one pitch worth of residual signals e_c(n)
has to be obtained from the next sub-frame f2. Here, using
the left-over second half of the one pitch worth of
residual signals in the sub-frame f1, that is, the
residual signals e(20) to e(39), as reference signals
e_ref(n), the portion giving the smallest square error
E(i) with respect to the reference signals e_ref(n) is
found in the sub-frame f2. This code series is used for
the second half of the one pitch worth of new residual
signals e_c(n), that is, the residual signals e_c(20) to
e_c(39). The square error E(i) is obtained by the
following calculation.

$$
E(i) = \sum_{n=0}^{L/2-1} \bigl( e_{\mathrm{ref}}(n) - x(n+i) \bigr)^2 \qquad (1)
$$

In equation (1), e_ref(n) = e(n+20) and x(n) = e(n+40)
(n = 0, 1, 2, ..., 19). In accordance with equation (1),
the error E(i) is obtained for each i, and the value
i_opt at which E(i) becomes the smallest is obtained.
Namely, i_opt is obtained by the next equation.

$$
i_{\mathrm{opt}} = \arg\min_i E(i)
= \arg\min_i \sum_{n=0}^{L/2-1} \bigl( e_{\mathrm{ref}}(n) - x(n+i) \bigr)^2 \qquad (2)
$$

In equation (2), "arg min" is an operator giving the
value of i at which the expression that follows it takes
its smallest value.
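Equations (1) and (2) translate directly into a search loop. This is a minimal sketch, assuming that i runs over every offset at which the reference fits inside the search region x (the text does not state the exact search range); the names square_error and find_i_opt are hypothetical.

```python
def square_error(e_ref, x, i):
    """E(i) from equation (1): sum over n of (e_ref(n) - x(n + i)) ** 2."""
    return sum((r - x[n + i]) ** 2 for n, r in enumerate(e_ref))


def find_i_opt(e_ref, x):
    """i_opt from equation (2): the offset i with the smallest E(i).

    Assumption: i ranges over 0 .. len(x) - len(e_ref).
    min() returns the first minimum, so ties go to the smallest i.
    """
    candidates = range(len(x) - len(e_ref) + 1)
    return min(candidates, key=lambda i: square_error(e_ref, x, i))


# The reference [1, 2, 3] occurs exactly at offset 2 of x.
print(find_i_opt([1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 2.0, 3.0, 0.0]))
```

For the sub-frame example, e_ref would be e[20:40] and x would be e[40:80]; the returned i_opt then selects e[40 + i_opt : 60 + i_opt] as e_c(20) to e_c(39).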

Using the calculated i_opt, 20 pieces of data are cut out
starting from the i_opt-th data from the top of the
sub-frame f2 to form new residual signals e_c(20) to
e_c(39). Namely, using the signals e(n) of the latter
half of the sub-frame f1 as reference signals e_ref(n),
the signals closest to the reference signals e_ref(n) are
found in the sub-frame f2 and used as the second half of
the one pitch worth of new residual signals e_c(n).

Here, for example, it is assumed that i_opt = 15 as a
result of the calculation based on equation (2).
Therefore, 20 continuous pieces of data are taken out
starting from the 15th residual signal data in the
sub-frame f2 (counting from 0) and used for the second
half of the one pitch worth of new residual signals
e_c(n). Namely, data e_c(20) to e_c(39) consist of e(55)
to e(74), respectively.

From the above processing, one pitch worth of data of the
new residual signals, that is, the residual signals
e_c(0) to e_c(39), is obtained. Figure 2B is a waveform
diagram of the residual signals e_c(n) calculated in this
way.

Next, the second pitch worth of the residual signals
e_c(n) (n = 40, 41, ..., 79) is obtained. First, half a
pitch worth of the residual signals e(n) is taken out
from an appropriate portion of the third sub-frame f3,
for example, at or around a peak position, to obtain the
first half of the second pitch worth of new residual
signals e_c(n).

Using the half pitch worth of residual signals that
follow the tail end of the data taken out as reference
signals e_ref(n), the data closest to the reference
signals e_ref(n) are searched for in the fourth sub-frame
f4 of the original residual signals e(n). As explained
above, the square error between the reference signals and
the residual signals, given by equation (1), is used as
the criterion for measuring the degree of approximation
to the reference signals. Taking the position where the
square error becomes the smallest as i_opt, half a pitch
worth of data is taken out starting from i_opt and used
as the second half of the one pitch worth of new residual
signals e_c(n).
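The whole halving procedure (pitch 1 built from sub-frames f1 and f2, pitch 2 from f3 and f4) can be sketched as follows. This is a minimal illustration under stated assumptions: L is even, the frame holds a whole number of pitch pairs, and the first half of each output pitch is cut at the start of its pitch pair, whereas the text suggests cutting near a peak. The names compress_pitch and halve_frame are hypothetical.

```python
def compress_pitch(e, start, L):
    """Return one pitch (L samples) of e_c built from two pitches of e.

    Assumes the first half is cut at `start` and that the next pitch,
    e[start + L : start + 2L], is already available (no extrapolation).
    """
    half = L // 2
    first = e[start:start + half]            # first half: kept as-is
    e_ref = e[start + half:start + L]        # left-over half: reference
    x = e[start + L:start + 2 * L]           # next pitch: search region
    # i_opt per equations (1) and (2): offset with the smallest square error
    i_opt = min(
        range(len(x) - half + 1),
        key=lambda i: sum((r - x[i + n]) ** 2 for n, r in enumerate(e_ref)),
    )
    return first + x[i_opt:i_opt + half]


def halve_frame(e, L):
    """Compress one frame of residual to half its length, pitch unchanged."""
    if L % 2 or len(e) % (2 * L):
        raise ValueError("sketch assumes even L and a whole number of pitch pairs")
    out = []
    for start in range(0, len(e), 2 * L):
        out.extend(compress_pitch(e, start, L))
    return out


# A perfectly periodic ramp with period L = 40 over one 160-sample frame
e = [float(n % 40) for n in range(160)]
print(len(halve_frame(e, 40)))
```

With N = 160 and L = 40 as in the example, halve_frame returns 80 samples. For a strictly periodic input, the best match lies exactly one period after the reference, so the output is simply the first two periods of the input: the duration halves while the period stays at 40 samples.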

Here, assuming the number of samples per pitch is L_1
and the number of samples per frame is N, when
i_opt + L_1/2 > N, the residual signals e(0) to e(N-1) of
one frame are not sufficient to form the new residual
signals e_c(n); data after the residual signal e(N-1)
become necessary. In an actual audio signal processing
apparatus, since the audio signal is input in units of
frames, the data of the next frame are sometimes not yet
available while the encoded audio data of the current
frame are being processed. In this case, the portion of
the data beyond one frame has to be estimated from the
frame of data being processed, for example by
extrapolation.

Extrapolation relies on the fact that audio data have
continuity over a certain time period. It uses one pitch
worth of data going back from the tail end of one frame
as an estimated value and connects it to the tail end of
the frame to fill the gap. Figure 3 is a waveform diagram
showing the processing for compensating for missing data
in the residual signals of one frame by extrapolation.

As shown in the figure, when using extrapolation, one
pitch worth L_1 of data is cut out starting from the
position reached by going back one pitch L_1 from the
tail end (position n = N) of one frame of data. This L_1
amount of data is added after the frame to fill the gap
in the data. Further, if needed, the cut-out one pitch
worth of data may be added once more.

The string of data e_x(n) compensated for by the above
extrapolation, with n counting the extrapolated samples
from the end of the frame, can be expressed by the next
equation:

$$
e_x(n) = e(n + N - L_1), \qquad 0 \le n < L_1 \qquad (3)
$$

When the residual signals e(0) to e(N-1) of one frame run
short, the gap in the data can be filled by
extrapolation and the new data used to produce new
residual signals e_c(n).
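Equation (3) and the option of adding the cut-out pitch once more can be sketched as follows. This is a minimal illustration; the names extrapolate and extend_for_search, and the rule of appending whole pitches until enough samples exist, are assumptions rather than part of the original description.

```python
import math


def extrapolate(e, L1, repeats=1):
    """Samples appended after the frame, e_x(n) = e(n + N - L1) per equation (3).

    repeats > 1 appends the same last pitch again, as the text allows.
    """
    N = len(e)
    if not 0 < L1 <= N:
        raise ValueError("pitch length must fit inside the frame")
    last_pitch = list(e[N - L1:N])  # one pitch going back from the tail end
    return last_pitch * repeats


def extend_for_search(e, L1, needed):
    """Return e followed by enough whole extrapolated pitches to cover `needed` samples."""
    if needed <= 0:
        return list(e)
    return list(e) + extrapolate(e, L1, math.ceil(needed / L1))


frame = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(extrapolate(frame, 4))
```

In the shortening procedure, needed would be i_opt + L_1/2 - N, the number of samples the search requires beyond the end of the current frame.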

Note that when extrapolating data, to eliminate
discontinuity at the joined portions, it is effective to
apply a window function to the data around the joint and
add the windowed data.

In the above reproduction method of a residual signal
e_c(n), to generate one pitch worth of data, the first
half of the data is generated from the first half of one
pitch worth of the original residual signals, while the
second half is generated by using the second half of that
pitch of the original residual signals as reference
signals, finding the code string closest to the reference
signals in the second pitch worth of data of the original
residual signals, and using the closest signals as the
second half of the one pitch worth of new residual
signals. As the criterion for gauging the degree of
approximation to the reference signals, the square error
is calculated and the signals giving the smallest square
error are found. Since each pitch worth of data in the
new residual signals e_c(n) is thus obtained by joining
data from different pitch sections as its first half and
second half, discontinuity arises at the joined portions
of the data in some cases. When an audio signal is
reproduced from the residual signals e_c(n) by an LPC
synthesis filter, the discontinuity of the residual
signals is reduced to some extent. To further eliminate
the discontinuity, the starting part of the second half
of the new residual signals e_c(n) is generated by
applying a window function to the reference signals
e_ref(n) and to the cut-out signals and adding the
results.

As the window function, the frequently used triangle
window can be used. Figures 4A to 4D are waveform
diagrams of the joining of residual signal data by using
a triangle window.

Figure 4A is a waveform diagram of the original residual
signals e(n). Figure 4B is a waveform diagram of new
residual signals e_c(0) to e_c(L_1/2-1) formed from the
half pitch of data e(0) to e(L_1/2-1) cut out from the
residual signals e(n). Using the second half of that
pitch of the residual signals e(n) as reference codes
e_ref(n), the position i_opt giving the smallest square
error E(i) is calculated, and L_1/2 samples of data are
cut out starting from the i_opt-th data in the second
pitch worth of the original residual signals e(n).

As explained above, by connecting the cut-out L_1/2
samples of data after the residual signals e_c(0) to
e_c(L_1/2-1), one pitch worth of residual signals e_c(n)
can be generated. However, discontinuity sometimes occurs
in residual signals e_c(n) generated by such simple
connection. To deal with this, the triangle window
functions T_1(n) and T_2(n) shown in Fig. 4C are applied
to the reference signals e_ref(n) and to the cut-out
signals, respectively, and the results are added to
obtain the second half data of the one pitch worth of
residual signals e_c(n). Figure 4D is a waveform diagram
of one pitch worth of residual signals generated by
connecting the first half data and the second half data
of one pitch using the triangle window functions.

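Note that the application of the triangle window
functions can be realized by a simple multiplication
using a variable λ that depends on the position of the
residual signals, as shown in the next equation, where n
is counted from the joining point at the start of the
second half:

$$
e_c\!\left(\frac{L}{2} + n\right) =
\begin{cases}
(1 - \lambda)\, e_{\mathrm{ref}}(n) + \lambda\, e(i_{\mathrm{opt}} + n), & 0 \le n < L/2 \\
e(i_{\mathrm{opt}} + n), & L/2 \le n < N'
\end{cases}
\qquad \lambda = \frac{n}{L/2} \qquad (4)
$$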

As explained above, by applying window functions to the
reference signals and the cut-out signals and adding the
results to form the residual signals e_c(n), the
continuity of data at the joined portions of the
generated residual signals e_c(n) can be improved.
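The triangle-window join of equation (4) can be sketched as below. This is a minimal illustration, assuming the cross-fade spans the whole reference (L/2 samples) and that cut holds the cut-out data e(i_opt), e(i_opt + 1), ... up to N'; the function name crossfade_join is hypothetical.

```python
def crossfade_join(e_ref, cut):
    """Second half of a new pitch per equation (4).

    e_ref: reference samples e_ref(n), n = 0 .. L/2 - 1.
    cut:   cut-out samples e(i_opt + n), at least as long as e_ref.
    Within the window, T_1 = 1 - lam weights e_ref and T_2 = lam weights cut.
    """
    half = len(e_ref)
    if len(cut) < half:
        raise ValueError("cut-out data must cover the whole window")
    out = []
    for n, c in enumerate(cut):
        if n < half:
            lam = n / half  # rises linearly from 0 toward 1 across the window
            out.append((1 - lam) * e_ref[n] + lam * c)
        else:
            out.append(c)  # past the window: cut-out data only
    return out


print(crossfade_join([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]))
```

At n = 0 the output equals e_ref(0), which directly follows the first half in the original signal, so the joint itself introduces no jump; the cut-out data take over gradually.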

In the above explanation, a signal processing method for
increasing the reproduction speed of an audio signal was
explained. When lowering the reproduction speed of an
audio signal, conversely to the above processing, it is
necessary to extend the residual signals e(n) on the time
axis without changing the pitch. Namely, processing is
performed for increasing the amount of data of the
residual signals e(n), for example by extrapolation,
while maintaining the length of the pitch.

When estimating data by extrapolation, the continuity of
an audio signal is exploited. Using the length of a pitch
as the unit, one pitch worth of data is cut out from the
tail end of one frame of data, and the cut-out string of
data is connected after the last data of the frame. If
necessary, the one pitch worth of data preceding the
first cut-out position may also be cut out and connected
to the tail end of the data extrapolated the first time.

Figure 5 is a waveform diagram of an example of
extension of residual signals e(n), for example, when
extending an original audio signal 1.5-fold on the time
axis.

As shown in the figure, in this example, four pitches'
worth of residual signal data fit in one frame. Namely,
when the length of one frame is N and the length of a
pitch is L_1 (N = 4L_1), it is necessary to extend one
frame of code data by two pitches' worth of data in order
to extend the residual signals e(n) 1.5-fold on the time
axis.

The waveform in Fig. 5 shows a method of lengthening the
residual signal e(n) by extrapolation. Here, the last one
pitch worth of data is cut out from the four pitches'
worth of data in one frame, and the cut-out data string
is connected twice to the tail end of the frame. As a
result of the extrapolation, two pitches' worth of
residual signals e(N) to e(N+2L_1-1) are added to the N
data e(0) to e(N-1) in one frame. Namely, new residual
signals e_c(n) consisting of (N+2L_1) data are generated
from the original one frame worth of N data. Since the
residual signals e_c(n) have the same pitch length as the
original residual signals e(n), by generating an audio
signal with an LPC synthesis filter using the converted
residual signals e_c(n), an audio signal extended 1.5-fold
on the time axis can be reproduced without changing the
pitch.

Note that the extrapolation of the residual signals
e(n) is not limited to the above method. For example,
when extending the original residual signals e(n) shown in
Fig. 5 1.5-fold on the time axis, it is also possible to cut
out two pitches' worth of data from the tail end of the
original one frame of residual signals and join that
cut-out data once to the end of the frame. As a result,
residual signals e_c(n) extended 1.5-fold from the
original signals are obtained without changing the pitch.
By generating an audio signal with an LPC synthesis filter
using the new residual signals e_c(n), an audio signal
extended 1.5-fold on the time axis can be reproduced
without changing the pitch.
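The alternative method can be sketched the same way, again assuming a list of residual samples and a known pitch length. The function name is hypothetical. Compared with repeating the last pitch, the appended block is the same length but carries the last two distinct pitch periods rather than one period twice.

```python
def extend_by_appending_tail(e, pitch_len, count):
    """Return e followed by one copy of its last `count` pitch periods."""
    n = len(e)
    tail_len = pitch_len * count
    if not 0 < tail_len <= n:
        raise ValueError("count * pitch_len must be in 1..len(e)")
    return list(e) + list(e[n - tail_len:])   # e(N-2L_1) .. e(N-1) when count == 2


frame = [0, 1, 2, 3, 4, 5, 6, 7]   # N = 8, L_1 = 2
e_c = extend_by_appending_tail(frame, 2, 2)
print(e_c)   # [0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7]
```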

Note that the above extension of residual signal
data by extrapolation simply connects a cut-out string of
data to the end of the original data, so discontinuity
sometimes arises at the joined portions of data in the
new residual signals e_c(n). When an audio signal is
reproduced from the residual signals e_c(n) by an LPC
synthesis filter, this discontinuity is reduced to some
extent. To eliminate it further, a window function can be
applied to the data around the joined portions of the
residual signals and the windowed data added together.

Figures 6A and 6B are views of processing for
connection using, as the window function, a triangular
window function having a length of m. Figure 6A shows an
example of a waveform of the residual signals e(n). As
shown in the figure, a data string longer by m (m < L_1)
than the one pitch length L_1 is cut out at the time of
cutting. Then, the triangular window function f_1(n) shown
in Fig. 6B is applied to the first m data of the cut-out
data. On the other hand, the triangular window function
f_2(n) shown in Fig. 6B is applied to the last m data of
the original one frame of residual signals e(n). The data
obtained by adding the results of applying the window
functions is connected at a position m data before the end
of the frame of the residual signals e(n). The L_1 data
following the first m data of the cut-out string are
connected thereafter.
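A minimal sketch of the windowed connection, under several assumptions not fixed by the text above: the cut-out string is taken to be the last L_1 + m samples of the frame, f_1 is a rising ramp (n + 1)/(m + 1), and f_2 = 1 - f_1, so the two windows sum to one across the overlap. All names are hypothetical.

```python
def triangle_windows(m):
    """Return (f1, f2): rising and falling triangular ramps of length m.

    Assumed shapes: f1(n) = (n + 1) / (m + 1) and f2(n) = 1 - f1(n),
    so that f1(n) + f2(n) = 1 at every overlap position.
    """
    f1 = [(n + 1) / (m + 1) for n in range(m)]
    f2 = [1.0 - w for w in f1]
    return f1, f2


def extend_one_pitch_crossfade(e, pitch_len, m):
    """Extrapolate one pitch period after frame e, cross-fading the joint.

    Assumption: the cut-out string is the last pitch_len + m samples of e.
    """
    n = len(e)
    if not 0 < m < pitch_len or pitch_len + m > n:
        raise ValueError("need 0 < m < pitch_len and pitch_len + m <= len(e)")
    cut = e[n - pitch_len - m:]                  # L_1 + m samples
    f1, f2 = triangle_windows(m)
    # f2 fades out the last m frame samples, f1 fades in the head of the cut-out.
    overlap = [f2[i] * e[n - m + i] + f1[i] * cut[i] for i in range(m)]
    # Frame up to N - m, then the blended m samples, then the remaining L_1 samples.
    return e[:n - m] + overlap + cut[m:]


frame = list(range(10))                          # toy frame, N = 10
extended = extend_one_pitch_crossfade(frame, pitch_len=4, m=2)
print([round(x, 3) for x in extended])
# [0, 1, 2, 3, 4, 5, 6, 7, 6.667, 6.333, 6, 7, 8, 9]
```

The result is exactly one pitch longer than the input. For a perfectly periodic frame the blended samples equal the original ones, so the windowing only changes the data where the joint would otherwise jump.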



In this way, one pitch worth of data can be
extrapolated after one frame of data. Furthermore, when
another pitch worth of data is connected after the
extrapolated data, it is sufficient to add data to which
window functions have been applied in the same way as
explained above.
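Connecting several pitches one after another can be sketched by repeating the windowed step on the data extended so far. As before, this assumes the cut-out string is the last L_1 + m samples of the current data and uses triangular windows f_1(i) = (i + 1)/(m + 1) and f_2(i) = 1 - f_1(i); the function name is hypothetical.

```python
def extend_pitches_crossfade(e, pitch_len, m, count):
    """Append `count` pitch periods to e, cross-fading every joint.

    Each step cuts out the last pitch_len + m samples of the data extended
    so far, blends their first m samples with the current last m samples
    (assumed windows: f1(i) = (i + 1) / (m + 1), f2(i) = 1 - f1(i)),
    and appends the remaining pitch_len samples.
    """
    out = list(e)
    if not 0 < m < pitch_len or pitch_len + m > len(out):
        raise ValueError("need 0 < m < pitch_len and pitch_len + m <= len(e)")
    for _ in range(count):
        n = len(out)
        cut = out[n - pitch_len - m:]            # copy taken before blending
        for i in range(m):
            f1 = (i + 1) / (m + 1)
            out[n - m + i] = (1.0 - f1) * out[n - m + i] + f1 * cut[i]
        out.extend(cut[m:])                      # one more pitch period
    return out


# 160-sample frame, L_1 = 40, m = 8: two joints give a 1.5-fold extension.
print(len(extend_pitches_crossfade(list(range(160)), 40, 8, 2)))   # 240
```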

As explained above, by using triangular windows to
apply window functions to a predetermined number of data
at the top of the cut-out data and at the end of one frame
of data, adding the results, and connecting them as data
of the new residual signals e_c(n), the discontinuity of
data caused by simple cutting and connection can be
suppressed, and the continuity of an audio signal
reproduced by an LPC synthesis filter based on the
residual signals e_c(n) can be improved.

As explained above, according to the present
invention, by shortening or extending residual signals on
the time axis while maintaining pitch information and
synthesizing an audio signal with an LPC synthesis filter
based on the generated new residual signals, an audio
signal compressed or expanded on the time axis can be
reproduced without changing the pitch. Namely, the
reproduction speed of an audio signal can be raised and
lowered without changing the pitch.
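The final synthesis step can be sketched as an all-pole filter driven by the (shortened or extended) residual. This section does not give the filter coefficients or their sign convention; the sketch assumes an inverse filter A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the one-tap coefficient in the example is hypothetical.

```python
def lpc_synthesis(residual, a, history=None):
    """All-pole LPC synthesis: s(n) = e(n) - sum_{k=1..p} a_k * s(n-k).

    Assumes A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, so the synthesis filter
    is 1 / A(z). `history` holds the p most recent past outputs, most
    recent last (zeros if omitted).
    """
    p = len(a)
    past = list(history) if history is not None else [0.0] * p
    if len(past) != p:
        raise ValueError("history must contain len(a) samples")
    out = []
    for e_n in residual:
        # a[k] multiplies s(n - 1 - k); past[-1] is s(n - 1).
        s_n = e_n - sum(a[k] * past[-1 - k] for k in range(p))
        out.append(s_n)
        if p:
            past = past[1:] + [s_n]
    return out


# Hypothetical one-tap filter a_1 = -0.5, i.e. s(n) = e(n) + 0.5 * s(n-1).
print(lpc_synthesis([1.0, 0.0, 0.0, 0.0], [-0.5]))   # [1.0, 0.5, 0.25, 0.125]
```

Because the filter only shapes the spectral envelope, feeding it a residual whose pitch period is unchanged is what lets the output be longer or shorter without a pitch shift.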

Note that the above embodiment is an example where
the present invention was applied to a CELP decoder.
Needless to say, the processing of the present invention
for converting the reproduction speed of an audio signal
is not limited to applications using a CELP decoder. The
invention may be applied, on the same principle, to other
audio signal processing apparatuses handling residual
signals that include pitch information of an audio signal.

Summarizing the effects of the invention, as
explained above, according to the audio signal processing
apparatus and processing method of the present invention,
it is possible to freely change the reproduction speed of
an audio signal without changing the pitch of the audio
signal.

Furthermore, when connecting data by extrapolation
or the like, applying window functions to the data around
the connection portions and adding the results makes it
possible to reduce the discontinuity of the joined
portions of the connected data, maintain the continuity
of the reproduced audio signal, and improve sound quality.

Note that the embodiments explained above were
described to facilitate the understanding of the present
invention and not to limit the present invention.
Accordingly, the elements disclosed in the above embodiments
include all design modifications and equivalents
belonging to the technical field of the present
invention.



